Cepstral compensation by polynomial approximation for environment-independent speech recognition

نویسندگان

  • Bhiksha Raj
  • Evandro B. Gouvêa
  • Pedro J. Moreno
  • Richard M. Stern
چکیده

Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One possible solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, it is extremely difficult to do this analytically. Previously attempted analytical solutions to the problem of noisy speech recognition have either used an overly-simplified mathematical description of the effects of noise on the statistics of speech, or they have relied on the availability of large environmentspecific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously-recorded or “stereo” recordings of clean and degraded speech. In this paper we introduce an approximation-based method to compute the effects of the environment on the parameters of the PDF of clean speech. In this work, we perform compensation by Vector Polynomial approximationS (VPS) for the effects of linear filtering and additive noise on the clean speech. We also estimate the parameters of the environment, namely the noise and the channel, by using piecewiselinear approximations of these effects. We evaluate the performance of this method (VPS) using the CMU SPHINX-II system and the 100-word alphanumeric CENSUS database. Performance is evaluated at several SNRs, with artificial white Gaussian noise added to the database. VPS provides improvements of up to 15 percent in relative recognition accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Computationally Efficient Cepstral Domain Feature Compensation

In this letter, we propose a novel approach to feature compensation performed in the cepstral domain. Processing in the cepstral domain has the advantage that the spectral correlation among different frequencies is taken into consideration. By introducing a linear approximation with diagonal covariance assumption, we modify the conventional log-spectral domain feature compensation technique to ...

متن کامل

Signal Processing for Robust Speech Recognition

This paper describes several new cepstral-based compensation procedures that render the SPHINX-II system more robust with respect to acoustical environment. The first algorithm, phonedependent cepstral compensation, is similar in concept to the previously-described MFCDCN method, except that cepstral compensation vectors are selected according to the current phonetic hypothesis, rather than on ...

متن کامل

Automatic Speech Recognition in GSM Network Using the Bit-Stream and Auxiliary parameters

The Global System for Mobile (GSM) environment includes three main problems for Automatic Speech Recognition (ASR) systems: noisy scenarios, source coding distortion and transmission errors.The second, source coding distortion must be explicitly addressed.The front-end of the speech recognition system combines feature extracted by converting the quantized spectral information of speech coder, p...

متن کامل

Environment normalization for robust speech recognition using direct cepstral comparison

In this paper we describe and evaluate a series of new algorithms that compensate for the effects of unknown acoustical environments (or changes in environment) through the use of compensation vectors that are added to the cepstral representations of speech that is input to a speech recognition system. These compensation vectors are obtained from direct frame-by-frame comparisons of the cepstra...

متن کامل

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996